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Abstract 

It is now well understood that (1) it is possible to reconstruct sparse signals exactly from 
what appear to be highly incomplete sets of linear measurements and (2) that this can be 
done by constrained £i minimization. In this paper, we study a novel method for sparse signal 
recovery that in many situations outperforms £i minimization in the sense that substantially 
fewer measurements are needed for exact recovery. The algorithm consists of solving a sequence 
of weighted f i-minimization problems where the weights used for the next iteration are computed 
from the value of the current solution. We present a series of experiments demonstrating the 
remarkable performance and broad applicability of this algorithm in the areas of sparse signal 
recovery, statistical estimation, error correction and image processing. Interestingly, superior 
gains are also achieved when our method is applied to recover signals with assumed near-sparsity 
in overcomplete representations — not by reweighting the £i norm of the coefficient sequence as is 
common, but by reweighting the £i norm of the transformed object. An immediate consequence 
is the possibility of highly efficient data acquisition protocols by improving on a technique known 
as compressed sensing. 

Keywords. £i-minimization, iterative reweighting, underdetermined systems of linear equa- 
tions, compressed sensing, the Dantzig selector, sparsity, FOCUSS. 

1 Introduction 



What makes some scientific or engineering problems at once interesting and chahenging is that 
often, one has fewer equations than unknowns. When the equations are linear, one would like to 
determine an object xq E M"- from data y = ^xq, where <I> is an m x n matrix with fewer rows than 
columns; i.e., m < n. The problem is of course that a system with fewer equations than unknowns 
usuahy has infinitely many solutions and thus, it is apparently impossible to identify which of these 
candidate solutions is indeed the "correct" one without some additional information. 

In many instances, however, the object we wish to recover is known to be structured in the 
sense that it is sparse or compressible. This means that the unknown object depends upon a 
smaller number of unknown parameters. In a biological experiment, one could measure changes of 
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expression in 30,000 genes and expect at most a couple hundred genes with a different expression 
level. In signal processing, one could sample or sense signals which are known to be sparse (or 
approximately so) when expressed in the correct basis. This premise radically changes the problem, 
making the search for solutions feasible since the simplest solution now tends to be the right one. 

Mathematically speaking and under sparsity assumptions, one would want to recover a sig- 
nal xq G M", e.g., the coefficient sequence of the signal in the appropriate basis, by solving the 
combinatorial optimization problem 

(Pq) min subject to y = ^x, (1) 

where \\x\\ig = \{i : Xi ^ 0}|. This is a common sense approach which simply seeks the simplest 
explanation fitting the data. In fact, this method can recover sparse solutions even in situations in 
which m <^ n. Suppose for example that all sets of m columns of $ are in general position. Then 
the program (Pq) perfectly recovers all sparse signals xq obeying HxqU^q < m/2. This is of little 
practical use, however, since the optimization problem ([T]) is nonconvex and generally impossible 
to solve as its solution usually requires an intractable combinatorial search. 
A common alternative is to consider the convex problem 

(Pi) min llxIL, subject to y = <I>x, (2) 

where \\x\\i^ = X^iLi Unlike (Pq), this problem is convex — it can actually be recast as a linear 
program — and is solved efficiently [1]. The programs (Pq) and (Pi) differ only in the choice of 
objective function, with the latter using an ^l norm as a proxy for the literal ^Q sparsity count. As 
summarized below, a recent body of work has shown that perhaps surprisingly, there are conditions 
guaranteeing a formal equivalence between the combinatorial problem (Pq) and its relaxation (Pi). 

The use of the norm as a sparsity-promoting functional traces back several decades. A leading 
early application was reflection seismology, in which a sparse reflection function (indicating mean- 
ingful changes between subsurface layers) was sought from bandlimited data. In 1973, Claerbout 
and Muir [2] first proposed the use of £i to deconvolve seismic traces. Over the next decade this 
idea was refined to better handle observation noise [3,4], and the sparsity-promoting nature of 
minimization was empirically confirmed. Rigorous results began to appear in the late-1980's, with 
Donoho and Stark [5] and Donoho and Logan [6] quantifying the ability to recover sparse refiec- 
tivity functions. The application areas for ii minimization began to broaden in the mid-1990's, as 
the LASSO algorithm [7] was proposed as a method in statistics for sparse model selection. Basis 
Pursuit [8] was proposed in computational harmonic analysis for extracting a sparse signal repre- 
sentation from highly overcomplete dictionaries, and a related technique known as total variation 
minimization was proposed in image processing [9,10]. 

Some examples of ^i type methods for sparse design in engineering include Vandenberghe et 
al. [11, 12] for designing sparse interconnect wiring, and Hassibi et al. [13] for designing sparse 
control system feedback gains. In [14], Dahleh and Diaz-Bobillo solve controller synthesis problems 
with an li criterion, and observe that the optimal closed-loop responses are sparse. Lobo et 
al. used Hi techniques to find sparse trades in portfolio optimization with fixed transaction costs 
in [15]. In [16], Ghosh and Boyd used ^i methods to design well connected sparse graphs; in [17], 
Sun et al. observe that optimizing the rates of a Markov process on a graph leads to sparsity. 
In [1, §6.5.4, §11.4.1], Boyd and Vandenberghe describe several problems involving £i methods for 
sparse solutions, including finding small subsets of mutually infeasible inequalities, and points that 



2 



violate few constraints. In a recent paper, Koh et al. used these ideas to carry out piecewise-linear 
trend analysis [18]. 

Over the last decade, the applications and understanding of ii minimization have continued to 
increase dramatically. Donoho and Huo [19] provided a more rigorous analysis of Basis Pursuit, and 
this work was extended and refined in subsequent years, see [20-22]. Much of the recent focus on 
£i minimization, however, has come in the emerging field of Compressive Sensing [23-25]. This is a 
setting where one wishes to recover a signal xq from a small number of compressive measurements 
y = ^xq. It has been shown that ii minimization allows recovery of sparse signals from remarkably 
few measurements [26,27]: supposing <I> is chosen randomly from a suitable distribution, then with 
very high probability, all sparse signals xq for which UxoU^g < m/a with a = 0{\og{n/m)) can be 
perfectly recovered by using (Pi). Moreover, it has been established [27] that Compressive Sensing 
is robust in the sense that ii minimization can deal very effectively (a) with only approximately 
sparse signals and (b) with measurement noise. The implications of these facts are quite far- 
reaching, with potential applications in data compression [24,28], digital photography [29], medical 
imaging [23,30], error correction [31,32], analog-to-digital conversion [33], sensor networks [34,35], 
and so on. (We will touch on some more concrete examples in Section [3l) 

The use of li regularization has become so widespread that it could arguably be considered the 
"modern least squares" . This raises the question of whether we can improve upon ii minimization? 
It is natural to ask, for example, whether a different (but perhaps again convex) alternative to io 
minimization might also find the correct solution, but with a lower measurement requirement than 
ii minimization. 

In this paper, we consider one such alternative, which aims to help rectify a key difference 
between the ii and Iq norms, namely, the dependence on magnitude: larger coefficients are penalized 
more heavily in the ii norm than smaller coefficients, unlike the more democratic penalization of the 
£q norm. To address this imbalance, we propose a weighted formulation of £i minimization designed 
to more democratically penalize nonzero coefficients. In Section O we discuss an iterative algorithm 
for constructing the appropriate weights, in which each iteration of the algorithm solves a convex 
optimization problem, whereas the overall algorithm does not. Instead, this iterative algorithm 
attempts to find a local minimum of a concave penalty function that more closely resembles the 
io norm. Finally, we would like to draw attention to the fact that each iteration of this algorithm 
simply requires solving one li minimization problem, and so the method can be implemented readily 
using existing software. 

In Section [3l we present a series of experiments demonstrating the superior performance and 
broad applicability of this algorithm, not only for recovery of sparse signals, but also pertaining 
to compressible signals, noisy measurements, error correction, and image processing. This section 
doubles as a brief tour of the applications of Compressive Sensing. In Section [H we demonstrate 
the promise of this method for efficient data acquisition. Finally, we conclude in Section [5] with a 
final discussion of related work and future directions. 
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2 An iterative algorithm for reweighted £i minimization 



2.1 Weighted ii minimization 

Consider the "weighted" ii minimization problem 

(WPi) min ^^Wjlxil subject to y = (3) 
1=1 

where wi,W2, . . . ,Wn are positive weights. Just hke its "unweighted" counterpart (Pi), this convex 
problem can be recast as a linear program. In the sequel, it will be convenient to denote the 
objective functional by where W is the diagonal matrix with wi,. . . ,Wn on the diagonal 

and zeros elsewhere. 

The weighted ii minimization (WPi) can be viewed as a relaxation of a weighted io minimization 
problem 

(WPo) min ||VFx||£o subject to y = ^x. (4) 

Whenever the solution to (Pq) is unique, it is also the unique solution to (WPq) provided that 
the weights do not vanish. However, the corresponding li relaxations (Pi) and (WPi) will have 
different solutions in general. Hence, one may think of the weights {wi) as free parameters in the 
convex relaxation, whose values — if set wisely — could improve the signal reconstruction. 

This raises the immediate question: what values for the weights will improve signal reconstruc- 
tion? One possible use for the weights could be to counteract the influence of the signal magnitude 
on the ii penalty function. Suppose, for example, that the weights were inversely proportional to 
the true signal magnitude, i.e., that 
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oo, XQ^i = 0. 



If the true signal xq is /c-sparse, i.e., obeys HxqH^ < k, then (WPi) is guaranteed to find the correct 
solution with this choice of weights, assuming only that m > k and that just as before, the columns 
of $ are in general position. The large (actually infinite) entries in Wi force the solution x to 
concentrate on the indices where Wi is small (actually finite), and by construction these correspond 
precisely to the indices where xq is nonzero. It is of course impossible to construct the precise 
weights ([5]) without knowing the signal xq itself, but this suggests more generally that large weights 
could be used to discourage nonzero entries in the recovered signal, while small weights could be 
used to encourage nonzero entries. 

For the sake of illustration, consider the simple 3-D example in Figure [H where xq = [0 1 0]^ 

and 

'211 
1 1 2 



We wish to recover xq from y = ^xq = [1 I]"'". Figure [I][a) shows the original signal xq, the set 
of points X G obeying ^x = ^xq = y, and the £i ball of radius 1 centered at the origin. The 
interior of the ii ball intersects the feasible set ^x = y, and thus (Pi) finds an incorrect solution, 
namely, x* = [1/3 1/3]-^ / xq (see Figure mb)). 

Consider now a hypothetical weighting matrix W = diag([3 1 3]^). Figure shows the 
"weighted ii ball" of radius = 1 centered at the origin. Compared to the unweighted ii 
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Figure 1: Weighting £i minimization to improve sparse signal recovery, (a) Sparse signal 
xq, feasible set = y, and £i hall of radius \\xo\\i-^. (h) There exists an x ^ xq for which 
< ||xo||fi. (c) Weighted l\ ball. There exists no x ^ xq for which \\Wx\\i^ < ||W^xo||^i. 



ball (Figure [U^a)), this ball has been sharply pinched at xq. As a result, the interior of the weighted 
£i ball does not intersect the feasible set, and consequently, (WPi) will find the correct solution 
X* = Xq. Indeed, it is not difficult to show that the same statements would hold true for any 
positive weighting matrix for which W2 < {wi + W3)/3. Hence there is a range of valid weights for 
which (WPi) will find the correct solution. As a rough rule of thumb, the weights should relate 
inversely to the true signal magnitudes. 



2.2 An iterative algorithm 

The question remains of how a valid set of weights may be obtained without first knowing xq. As 
Figure [1] shows, there may exist a range of favorable weighting matrices W for each fixed xq, which 
suggests the possibility of constructing a favorable set of weights based solely on an approximation 
X to Xq or on other side information about the vector magnitudes. 

We propose a simple iterative algorithm that alternates between estimating xq and redefining 
the weights. The algorithm is as follows: 

1. Set the iteration count £ to zero and wf^^ = 1, z = l,...,n. 

2. Solve the weighted £i minimization problem 

x^^) = argmin subject to y = ^x. 



3. Update the weights: for each i = 1, . . . ,n, 

\xl '\+e 

4. Terminate on convergence or when £ attains a specified maximum number of iterations -^max* 
Otherwise, increment £ and go to step 2. 
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We introduce the parameter e > in step 3 in order to provide stability and to ensure that a 
zero- valued component in x*^^-* does not strictly prohibit a nonzero estimate at the next step. As 
empirically demonstrated in Section [3l e should be set slightly smaller than the expected nonzero 
magnitudes of xq. In general, the recovery process tends to be reasonably robust to the choice of e. 

Using an iterative algorithm to construct the weights {wi) tends to allow for successively better 
estimation of the nonzero coefficient locations. Even though the early iterations may find inaccurate 
signal estimates, the largest signal coefficients are most likely to be identified as nonzero. Once 
these locations are identified, their influence is downweighted in order to allow more sensitivity for 
identifying the remaining small but nonzero signal coefficients. 

Figure [2] illustrates this dynamic by means of an example in sparse signal recovery. Figure [2)J^ a) 
shows the original signal of length n = 512, which contains 130 nonzero spikes. We collect m = 256 
measurements where the matrix <1> has independent standard normal entries. We set e = 0.1 
and imax = 2. Figures [2]^b)-(d) show scatter plots, coefficient-by-coefficient, of the original signal 
coefficient xq versus its reconstruction x^^\ In the unweighted iteration (Figure [2jb)), we see 
that all large coefficients in xq are properly identified as nonzero (with the correct sign), and that 
ll^o — a^^^'^ll^oo = 0.4857. In this first iteration, = 256 = m, with 15 nonzero spikes in 

Xq reconstructed as zeros and 141 zeros in xq reconstructed as nonzeros. These numbers improve 
after one reweighted iteration (Figure [2jc)) with now ||x — x^^^H^^ = 0.2407, Hx^^^H^p = 256 = 
m, 6 nonzero spikes in xq reconstructed as zeros and 132 zeros in xq reconstructed as nonzeros. 
This improved signal estimate is then sufficient to allow perfect recovery in the second reweighted 
iteration (Figure [2jd)). 



2.3 Analytical justification 

The iterative reweighted algorithm falls in the general class of MM algorithms, see [36] and ref- 
erences therein. In a nutshell, MM algorithms are more general than EM algorithms, and work 
by iteratively minimizing a simple surrogate function majorizing a given objective function. To 
establish this connection, consider the problem 

n 

min y log(|xj|+e) subject to y = ^x, (7) 



which is equivalent to 

Ey — ^x 

log(tii + e) subiect to i i ^ ' . (8) 
^,u,^^. . \xi\<Ui, ^ = l,...,n. 



1=1 



The equivalence means that if x* is a solution to ([7]) , then (x* , | x* | ) is a solution to ([SD . And 
conversely, if (x*,m*) is a solution to ([8]), then x* is a solution to d?]). 
Problem ([8]) is of the general form 

min g{v) subject to u G C, 

V 

where C is a convex set. In ([8]), the function g is concave and, therefore, below its tangent. Thus, 
one can improve on a guess v at the solution by minimizing a linearization of g around v. This 
simple observation yields the following MM algorithm: starting with v^^^ G C, inductively define 

= argmin g{v^^^) + Vg{v^^^) ■ {v - v^^^) subject to v e C. 
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(c) (d) 

Figure 2: Sparse signal recovery through reweighted £i iterations, (a) Original length n — 
512 signal xo with 130 spikes, (b) Scatter plot, coefEcient-by-coelEcient, of xq versus its 
reconstruction x'"^ using unweighted £i minimization, (c) Reconstruction z^^^ after the first 
reweighted iteration, (d) Reconstruction a;^^^ after the second reweighted iteration. 
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Each iterate is now the solution to a convex optimization problem. In the case dH) of interest, this 
gives 

(x(^+i),n(^+i))=argmin subject to fT5''' . , 

^^W^g \xi\<Ui, i = l,...,n, 



i=l 



which is of course equivalent to 



El I 
— subject to y = <Px. 



\x) '\+e 



One now recognizes our iterative algorithm. 

In two papers [37,38], Fazel et al. have considered the same reweighted ii minimization algorithm 
as in Section [2.21 first as a heuristic algorithm for applications in portfolio optimization [37], and 
second as a special case of an iterative algorithm for minimizing the rank of a matrix subject to 
convex constraints [38]. Using general theory, they argue that X^^^i log(|a;-^^ | + e) converges to 
a local minimum of g{x) = Y^i=i^og{\xi\ + e) (note that this not saying that the sequence (x^^-*) 
converges). Because the log-sum penalty function is concave, one cannot expect this algorithm to 
always find a global minimum. As a result, it is important to choose a suitable starting point for 
the algorithm. Like [38], we have suggested initializing with the solution to (Pi), the unweighted 
ii minimization. In practice we have found this to be an effective strategy. Further connections 
between our work and FOCUSS strategies are discussed at the end of the paper. 

The connection with the log-sum penalty function provides a basis for understanding why 
reweighted ii minimization can improve the recovery of sparse signals. In particular, the log- 
sum penalty function has the potential to be much more sparsity-encouraging than the ii norm. 
Consider, for example, three potential penalty functions for scalar magnitudes t: 

fo{t) = l{t^o}, Mt) = and fiog,e{t) log(l + 1*1/^), 

where the constant of proportionality is set such that /iog,e(l) = 1 = /o(l) = /i(l)) see Figure [3l 
The first (io-like) penalty function /o has infinite slope at t = 0, while its convex (^i-like) relaxation 
/i has unit slope at the origin. The concave penalty function f\og,t, however, has slope at the origin 
that grows roughly as 1/e when e — > 0. Like the Iq norm, this allows a relatively large penalty to 
be placed on small nonzero coefficients and more strongly encourages them to be set to zero. In 
fact, /iog,e(i) tends to fo{t) as e — > 0. Following this argument, it would appear that e should be set 
arbitrarily small, to most closely make the log-sum penalty resemble the io norm. Unfortunately, 
as e — > 0, it becomes more likely that the iterative reweighted ii algorithm will get stuck in an 
undesirable local minimum. As shown in Section [3l a cautious choice of e (slightly smaller than 
the expected nonzero magnitudes of x) provides the stability necessary to correct for inaccurate 
coefficient estimates while still improving upon the unweighted ii algorithm for sparse recovery. 



2.4 Variations 

One could imagine a variety of possible reweighting functions in place of ([6|). We have experimented 
with alternatives, including a binary (large/small) setting of Wi depending on the current guess. 
Though such alternatives occasionally provide superior reconstruction of sparse signals, we have 
found the rule ([6]) to perform well in a variety of experiments and applications. 
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Figure 3: At the origin, the canonical (q sparsity count fo{t) is better approximated by the 
log-sum penalty function /iog,e(i) than by the traditional convex £i relaxation fi{t). 

Alternatively, one can attempt to minimize a concave function other than the log-sum penalty. 
For instance, we may consider 

n 

9{x) = ^atan(|xi|/e) 
1=1 

in lieu of Y17=i + l^^il/^)- The function atan is bounded above and £o-like. If x is the current 
guess, this proposal updates the sequence of weights as Wi = l/{xj + e^). There are of course 
many possibilities of this nature and they tend to work well (sometimes better than the log-sum 
penalty). Because of space limitations, however, we will limit ourselves to empirical studies of the 
performance of the log-sum penalty, and leave the choice of other penalties for further research. 

2.5 Historical progression 

The development of the reweighted li algorithm has an interesting historical parallel with the use 
of Iteratively Reweighted Least Squares (IRLS) for robust statistical estimation [39-41]. Consider 
a regression problem Ax = b where the observation matrix A is over determined. It was noticed that 
standard least squares regression, in which one minimizes ||r||2 where r = Ax — b is the residual 
vector, lacked robustness vis a vis outliers. To defend against this, IRLS was proposed as an 
iterative method to minimize instead the objective 

min p{ri{x)), 

i 

where p(-) is a penalty function such as the ii norm [39,42]. This minimization can be accomplished 
by solving a sequence of weighted least-squares problems where the weights {wi} depend on the 
previous residual Wi = p'(ri)/ri. For typical choices of p this dependence is in fact inversely 
proportional — large residuals will be penalized less in the subsequent iteration and vice versa — 
as is the case with our reweighted ii algorithm. Interestingly, just as IRLS involved iteratively 
reweighting the ^2-iiorm in order to better approximate an £i-like criterion, our algorithm involves 
iteratively reweighting the ^i-norm in order to better approximate an £o-hke criterion. 

3 Numerical experiments 

We present a series of experiments demonstrating the benefits of reweighting the ii penalty. We 
will see that the requisite number of measurements to recover or approximate a signal is typi- 
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cally reduced, in some cases by a substantial amount. We also demonstrate that the reweighting 
approach is robust and broadly applicable, providing examples of sparse and compressible signal 
recovery, noise-aware recovery, model selection, error correction, and 2-dimensional total-variation 
minimization. Meanwhile, we address important issues such as how one can choose e wisely and 
how robust is the algorithm to this choice, and how many reweighting iterations are needed for 
convergence. 

3.1 Sparse signal recovery 

The purpose of this first experiment is to demonstrate (1) that reweighting reduces the necessary 
sampling rate for sparse signals (2) that this recovery is robust with respect to the choice of e 
and (3) that few reweighting iterations are typically needed in practice. The setup for each trial 
is as follows. We select a sparse signal xq of length n = 256 with ||xo||£o = ^- The k nonzero 
spike positions are chosen randomly, and the nonzero values are chosen randomly from a zero-mean 
unit-variance Gaussian distribution. We set m = 100 and sample a random m x n matrix <1> with 
i.i.d. Gaussian entries, giving the data y = ^xq. To recover the signal, we run several reweighting 
iterations with equality constraints (see Section 12. 2p . The parameter e remains fixed during these 
iterations. Finally, we run 500 trials for various fixed combinations of k and e. 

Figure [H^a) compares the performance of unweighted £i to reweighted £i for various values of 
the parameter e. The solid line plots the probability of perfect signal recovery (declared when 
||xo — < 10"'^) for the unweighted ii algorithm as a function of the sparsity level k. The 

dashed curves represent the performance after 4 reweighted iterations for several different values of 
the parameter e. We see a marked improvement over the unweighted ii algorithm; with the proper 
choice of e, the requisite oversampling factor m/k for perfect signal recovery has dropped from 
approximately 100/25 = 4 to approximately 100/33 ~ 3. This improvement is also fairly robust 
with respect to the choice of e, with a suitable rule being about 10% of the standard deviation of 
the nonzero signal coefficients. 

Figure m^b) shows the performance, with a fixed value of e = 0.1, of the reweighting algorithm 
for various numbers of reweighted iterations. We see that much of the benefit comes from the first 
few reweighting iterations, and so the added computational cost for improved signal recovery is 
quite moderate. 

3.2 Sparse and compressible signal recovery with adaptive choice of e 

We would like to confirm the benefits of reweighted ii minimization for compressible signal recovery 
and consider the situation when the parameter e is not provided in advance and must be estimated 
during reconstruction. We propose an experiment in which each trial is designed as follows. We 
sample a signal of length n = 256 from one of three types of distribution: (1) /c-sparse with 
i.i.d. Gaussian entries, (2) A:-sparse with i.i.d. symmetric Bernoulli ±1 entries, or (3) compressible, 
constructed by randomly permuting the sequence for a fixed p, applying random sign 

fiips, and normalizing so that ||xo||£^ = 1. We set m = 128 and sample a random m x n matrix 
$ with i.i.d. Gaussian entries. To recover the signal, we again solve a reweighted ii minimization 
with equality constraints y = ^xq = ^x. In this case, however, we adapt e at each iteration as a 
function of the current guess x^^'^; step 3 of the algorithm is modified as follows: 
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4 Reweighting Iterations 



Reweighting, e = 0.1 
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Sparsity k 



Sparsity k 



(a) 



(b) 



Figure 4: Sparse signal recovery from m = 100 random measurements of a length n = 256 
signal. The probability of successful recovery depends on the sparsity level k. The dashed 
curves represent a reweighted £i algorithm that outperforms the traditional unweighted ii 
approach (solid curve), (a) Performance after 4 reweighting iterations as a function of e. (b) 
Performance with fixed e = 0.1 as a function of the number of reweighting iterations. 

3. Let denote a reordering of in decreasing order of magnitude. Set 



Our motivation for choosing this value for e is based on the anticipated accuracy of ii minimization 
for arbitrary signal recovery. In general, the reconstruction quality afforded by £i minimization is 
comparable (approximately) to the best ZQ-term approximation to xq, and so we expect approxi- 
mately this many signal components to be approximately correct. Choosing the smallest of these 
gives us a rule of thumb for choosing e. 

We run 100 trials of the above experiment for each signal type. The results for the A;-sparse 
experiments are shown in Figure [5ja). The solid black line indicates the performance of unweighted 
ii recovery (success is declared when ||a;o — < 10~^). This curve is the same for both the 

Gaussian and Bernoulli coefficients, as the success or failure of unweighted ii minimization depends 
only on the support and sign pattern of the original sparse signal. The dashed curves indicate the 
performance of reweighted ii minimization for Gaussian coefficients (blue curve) and Bernoulli 
coefficients (red curve) with imax = 4. We see a substantial improvement for recovering sparse 
signals with Gaussian coefficients, yet we see only very slight improvement for recovering sparse 
signals with Bernoulli coefficients. This discrepancy likely occurs because the decay in the sparse 
Gaussian coefficients allows large coefficients to be easily identified and significantly downweighted 
early in the reweighting algorithm. With Bernoulli coefficients there is no such "low-hanging fruit" . 

The results for compressible signals are shown in Figure [5]^b),(c). Each plot represents a his- 
togram, over 100 trials, of the £2 reconstruction error improvement afforded by reweighting, namely, 
ll^o — x*^^) ll^j/llxo — x'^'') ll^j . We see the greatest improvements for smaller p corresponding to sparser 
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Figure 5: (a) Improvements in sparse signal recovery from reweighted ii minimization when 
compared to unweighted £i minimization (sohd black curve). The dashed blue curve corre- 
sponds to sparse signals with Gaussian coefhcients; the dashed red curve corresponds to sparse 
signals with Bernoulli coefficients. (b),(c) Improvements in compressible signal recovery from 
reweighted ii minimization when compared to unweighted £i minimization; signal coefhcients 
decay as n~^/P with (b) p = 0.4 and (c) p = 0.7. Histograms indicate the £2 reconstruction 
error improvements afforded by the reweighted algorithm. 



signals, with reductions in £2 reconstruction error up to 50% or more. As p — > 1, the improvements 
diminish. 

3.3 Recovery from noisy measurements 

Reweighting can be applied to a noise-aware version of £1 minimization, further improving the 
recovery of signals from noisy data. We observe y = $xo + z, where z is a noise term which is either 
stochastic or deterministic. To recover xq, we adapt quadratically-constrained £1 minimization [7, 
27], and modify step 2 of the reweighted £1 algorithm with equality constraints (see Section [2. 2p as 

x*-^^ = argmin subject to \\y — ^xWi^ < S. (9) 

The parameter 5 is adjusted so that the true vector xq be feasible (resp. feasible with high proba- 
bility) for ([9]) in the case where z is deterministic (resp. stochastic). 

To demonstrate how this proposal improves on plain £1 minimization, we sample a vector of 
length n = 256 from one of three types of distribution: (1) A;-sparse with k = 38 and i.i.d. Gaussian 
entries, (2) fe-sparse with A; = 38 and i.i.d. symmetric Bernoulli ±1 entries, or (3) compressible, 
constructed by randomly permuting the sequence for a fixed p, applying random sign 

flips, and normalizing so that Ha^olkoo ~ '^^^^ matrix $ is 128 x 256 with i.i.d. Gaussian entries 
whose columns are subsequently normalized, and the noise vector z is drawn from an i.i.d. Gaussian 
zero-mean distribution properly rescaled so that \\z\\i2 = PW^xWi^ with (3 = 0.2; i.e., z = ctzq where 
zq is standard white noise and a = (5\\(^x\\i^/\\zq\\£^. The parameter 5 is set to 5"^ = a'^{m + 2\/2m) 
as this provides a likely upper bound on H-zH^j- We set e to be the empirical maximum value of 
ll^*'^lkoo over several realizations of a random vector ^ ~ AA(0, a'^Im)- (This gives a rough estimate 
for the noise amplitude in the signal domain, and hence, a baseline above which significant signal 
components could be identified.) 

We run 100 trials for each signal type. Figure [6] shows histograms of the £2 reconstruction error 
improvement afforded by 9 iterations, i.e., each histogram documents H^o — x*^^) H^j/ll^o ~ ^^'^^Il£2 
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(a) 

Figure 6: Sparse and compressible signal reconstruction from noisy measurements. His- 
tograms indicate the £2 reconstruction error improvements afforded by tie reweighted 
quadratically-constrained £1 minimization for various signal types. 

over 100 trials. We see in these experiments that the reweighted quadratically-constrained ii 
minimization typically offers improvements ||rEo — x^^^ H^j/INo ~ 2;^'''' ||^2 range 0.5 — 1 in many 

examples. The results for sparse Gaussian spikes are slightly better than for sparse Bernoulli spikes, 
though both are generally favorable. Similar behavior holds for compressible signals, and we have 
observed that smaller values of p (sparser signals) allow the most improvement. 

3.4 Statistical estimation 

Reweighting also enhances statistical estimation as well. Suppose we observe y = ^xq + z, where 
$ is m X n with m < n, and z is a noise vector z ~ M{0, cr'^Im) drawn from an i.i.d. Gaussian 
zero-mean distribution, say. To estimate xq, we adapt the Dantzig selector [43] and modify step 2 
of the reweighted £1 algorithm as 

= argmin HVF^'^^xH^, subject to \\^*{y - ^x)\\i^ < 6. (10) 

Again 5 is a parameter making sure that the true unknown vector is feasible with high probability. 

To judge this proposal, we consider a sequence of experiments in which xq is of length n = 256 
with k = 8 nonzero entries in random positions. The nonzero entries of xq have i.i.d. entries ac- 
cording to the model Xi = Si{l + \ai\) where the sign Si = ±1 with probability 1/2 and aj ~ AA(0, 1). 
The matrix $ is 72 x 256 with i.i.d. Gaussian entries whose columns are subsequently normalized 
just as before. The noise vector (zi) has i.i.d. AA(0, o"^) components with a = l/3y^k/m ~ 0.11. 
The parameter 6 is set to be the empirical maximum value of over several realizations of 

a random vector z ~ M{0, a'^Im)- We set e = 0.1. 

After each iteration of the reweighted Dantzig selector, we also refine our estimate x^^-* using 
the Gauss-Dantzig technique to correct for a systematic bias [43]. Let I = {i : \x^^^\ > a ■ a} with 
a = 1/4. Then one substitutes x^^^ with the least squares estimate which solves 

min \\y — ^xllp^ subject to Xi = 0, i 4 I: 

that is, by regressing y onto the subset of columns indexed by /. 

We first report on one trial with ^max = 4. Figure[7U^a) shows the original signal xq along with the 
recovery x^^^ using the first (unweighted) Dantzig selector iteration; the error is ||xo — x^'^^ [[^j = 1.46. 
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Unweighted 
Gauss-Dantzig 


Reweighted 
Gauss-Dantzig 


Median error ratio 


2.43 


5.63 


Mean error ratio 


6.12 


1.21 


Avg. false positives 


3.25 


0.50 


Avg. correct detections 


7.86 


7.80 



Table 1: Model selection results for unweighted and reweighted versions of the Gauss-Dantzig 
estimator. In each of 5000 trials the true sparse model contains k = 8 nonzero entries. 



Figure [7Jb) shows the Dantzig selector recovery after 4 iterations; the error has decreased to 
||xo — 2;*''*^||f2 = 1.25. Figure [Tl^c) shows the Gauss-Dantzig estimate x^^^ obtained from the first 
(unweighted) Dantzig selector iteration; this decreases the error to ||xo — x^'^^H^j = 0.57. The 
estimator correctly includes all 8 positions at which xq is nonzero, but also incorrectly includes 4 
positions at which xq should be zero. In Figure Wid) we see, however, that all of these mistakes 
are rectified in the Gauss-Dantzig estimate x^^^ obtained from the reweighted Dantzig selector; the 
total error also decreases to ||xo — 2;(^)||£2 = 0.29. 

We repeat the above experiment across 5000 trials. Figure [8] shows a histogram of the ratio p'^ 
between the squared error loss of some estimate x and the ideal squared error 

P ■ v-^n ■ r 2 2\ 

Ei=imin(xo_i,a^) 

for both the unweighted and reweighted Gauss-Dantzig estimators. (The results are also summa- 
rized in Table[TJ) For an interpretation of the denominator, the ideal squared error ^ min(xQ j, cr^) 
is roughly the mean-squared error one could achieve if one had available an oracle supplying perfect 
information about which coordinates of xq are nonzero, and which are actually worth estimating. 
We see again a significant reduction in reconstruction error; the median value of decreases from 
2.43 to 1.21. As pointed out, a primary reason for this improvement comes from a more accurate 
identification of significant coefficients: on average the unweighted Gauss-Dantzig estimator in- 
cludes 3.2 "false positives," while the reweighted Gauss-Dantzig estimator includes only 0.5. Both 
algorithms correctly include all 8 nonzero positions in a large majority of trials. 
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Figure 7: Reweighting the Dantzig selector. Blue asterisks indicate the original signal xq; 
red circles indicate the recovered estimate, (a) Unweighted Dantzig selector, (b) Reweighted 
Dantzig selector, (c) Unweighted Gauss-Dantzig estimate, (d) Reweighted Gauss-Dantzig 
estimate. 
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Figure 8: Histogram of the ratio between the squared error loss and the ideal squared 
error for (a ) unweighted Gauss-Dantzig estimator and (b ) reweighted Gauss-Dantzig estimator. 
Approximately 5% of the tail of each histogram has been truncated for display; across 5000 
trials the maximum value observed was ~ 165. 

3.5 Error correction 

Suppose we wish to transmit a real- valued signal xq £ M", a block of n pieces of information, to 
a remote receiver. The vector xq is arbitrary and in particular, nonsparse. The difficulty is that 
errors occur upon transmission so that a fraction of the transmitted codeword may be corrupted 
in a completely arbitrary and unknown fashion. In this setup, the authors in [31] showed that 
one could transmit n pieces of information reliably by encoding the information as ^xq where 
$ S M™^", m > n, is a suitable coding matrix, and by solving 



upon receiving the corrupted codeword y = $xo + e; here, e is the unknown but sparse corruption 
pattern. The conclusion of [31] is then that the solution to this program recovers xq exactly provided 
that the fraction of errors is not too large. Continuing on our theme, one can also enhance the 
performance of this error-correction strategy, further increasing the number of corrupted entries 
that can be overcome. 

Select a vector of length n = 128 with elements drawn from a zero-mean unit-variance Gaussian 
distribution, and sample an mxn coding matrix <I> with i.i.d. Gaussian entries yielding the codeword 
For this experiment, m = 4n = 512, and k random entries of the codeword are corrupted with 
a sign flip. For the recovery, we simply use a reweighted version of (jlip . Our algorithm is as follows: 

1. Set £ = and wf'^ = 1 for i = 1, 2, . . . , m. 

2. Solve the weighted ii minimization problem 



min \\y — ^x\\£. 





(12) 
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Figure 9: Unweighted (solid curve) and reweighted (dashed curve) l\ signal recovery from 
corrupted measurements y ~ ^xq + e. The signal xq has length n = 128, the codeword y has 
size m — An ~ 512, and the number of corrupted entries \\e\\e„ — k. 



3. Update the weights; let = y — ^x^^^ and for each i = 1, . . . , m, define 





(13) 



4. Terminate on convergence or when £ attains a specified maximum number of iterations -^max* 
Otherwise, increment i and go to step 2. 

We set e to be some factor (3 times the standard deviation of the corrupted codeword y. We run 
100 trials for several values of /? and of the size k of the corruption pattern. 

Figure [9] shows the probability of perfect signal recovery (declared when ||xo — x\\i^ < 10^'^) 
for both the unweighted £i decoding algorithm and the reweighted versions for various values of (3 
(with imax = 4). Across a wide range of values (3 (and hence e), we see that reweighting increases 
the number of corrupted entries (as a percentage of the codeword size m) that can be overcome, 
from approximately 28% to 35%. 

3.6 Total variation minimization for sparse image gradients 

In a different direction, reweighting can also boost the performance of total-variation (TV) min- 
imization for recovering images with sparse gradients. Recall the total-variation norm of a 2- 
dimensional array (xij), 1 < i,j < n, defined as the ii norm of the magnitudes of the discrete 
gradient, 



where {Dx)ij is the 2-dimensional vector of forward differences {Dx)ij = (xi^ij — Xij , Xij^i — Xij) . 
Because many natural images have a sparse or nearly sparse gradient, it makes sense to search for 
the reconstruction with minimal TV norm, i.e.. 





l<i,j<n— 1 



min ||x||tv subject to y = ^x; 



(14) 
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see [9, 10], for example. It turns out that this problem can be recast as a second-order cone 
program [44], and thus solved efficiently. 

We adapt (jl4p by minimizing a sequence of weighted TV norms as follows: 

1. Set £ = and w^^] = I, 1 < i, j < n - 1. 

2. Solve the weighted TV minimization problem 

x^^) = argmin wf'j \\{Dx)ij\\, subject to y = ^x. 

l<i j<n— 1 



3. Update the weights; for each I < i, j < n — 1, 

4. Terminate on convergence or when i attains a specified maximum number of iterations -^max* 
Otherwise, increment i and go to step 2. 

Naturally, this iterative algorithm corresponds to minimizing a sequence of linearizations of the 
log-sum function X^kj j<„_i log(||(-Dx)jj|| + e) around the previous signal estimate. 

To show how this can enhance the performance of the recovery, consider the following exper- 
iment. Our test image is the Shepp-Logan phantom of size n = 256 x 256 (see Figure fTOT a)). 
The pixels take values between and 1, and the image has a nonzero gradient at 2184 pixels. We 
measure y by sampling the discrete Fourier transform of the phantom along 10 pseudo-radial lines 
(see Figure [TOl b)). That is, y = ^xq, where ^> represents a subset of the Fourier coefficients of xq. 
In total, we take m = 2521 real-valued measurements. 

Figure [TOTc) shows the result of the classical TV minimization, which gives a relative error 
equal to \\xq — x^^^\\e2/\\xQ\\e^ 0.43. As shown in Figure fTOT d). however, we see a substantial 
improvement after 6 iterations of reweighted TV minimization (we used 0.1 for the value of e). The 
recovery is near-perfect, with a relative error obeying ||xo — x'^^^H^j/UxoH^j ~ 2 x 10~^. 

For point of comparison it takes approximately 17 radial lines (m = 4257 real-valued measure- 
ments) to perfectly recover the phantom using unweighted TV minimization. Hence, with respect to 
the sparsity of the image gradient, we have reduced the requisite oversampling factor significantly, 
from IyII f« 1.95 down to |||^ 1.15. It is worth noting that comparable reconstruction perfor- 
mance on the phantom image has also been recently achieved by directly minimizing a nonconvex 
£p norm, p < 1, of the image gradient [45]; we discuss this approach further in Section [5.1[ 



4 Reweighted £i analysis 

In many problems, a signal may assume sparsity in a possibly overcomplete representation. To 
make things concrete, suppose we are given a dictionary ^ of waveforms (the columns 

of ^) which allows representing any signal as x = ^a. The representation a is deemed sparse 
when the vector of coefficients a has comparably few significant terms. In some applications, it 
may be natural to choose ^ as an orthonormal basis but in others, a sparse representation of the 
signal X may only become possible when ^' is a redundant dictionary; that is, it has more columns 
than rows. A good example is provided by an audio signal which often is sparsely represented as 
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(c) 




Figure 10: Image recovery from reweighted TV minimization, (a) Original 256 x 256 phantom 
image, (b) Fourier- domain sampling pattern, (c) Minimum-TV reconstruction; total variation 
= 1336. (d) Reweighted TV reconstruction; total variation (unweighted) = 1464. 



a superposition of waveforms of the general shape a~^/'^g{{t — to)/o')e*'^*, where Iq, uj, and a are 
discrete shift, modulation and scale parameters. 

In this setting, the common approach for sparsity-based recovery from linear measurements 
goes by the name of Basis Pursuit [8] and is of the form 

min \\a\\i-^ subject to y = $^'q; (16) 

that is, we seek a sparse set of coefficients a that synthesize the signal x = ^a. We call this 
synthesis-based i\ recovery. A far less common approach, however, seeks a signal x whose coefficients 
a = '^*x (when x is analyzed in the dictionary are sparse 

min||^*x||^^ subject to y = <^x. (17) 

We call this analysis-based ii recovery. When ^ is an orthonormal basis, these two programs are 
identical, but in general they find different solutions. When ^ is redundant, (jl7p involves fewer 
unknowns than (jl6p and may be computationally simpler to solve [46]. Moreover, in some cases 
the analysis-based reconstruction may in fact be superior, a phenomenon which is not very well 
understood; see [47] for some insights. 

Both programs are amenable to reweighting but what is interesting is the combination of 
analysis-based ii recovery and iterative reweighting which seems especially powerful. This section 
provides two typical examples. For completeness, the iterative reweighted £i-analysis algorithm is 
as follows: 

1. Set £ = and w^p = 1, j G J (J indexes the dictionary). 

2. Solve the weighted £i minimization problem 



X 



aigmin\\W^^^^*x\\e-^ subject to y = ^x. 



3. Put a(^) = ^'*x(^) and define 

4. Terminate on convergence or when ^ attains a specified maximum number of iterations 
Otherwise, increment ^ and go to step 2. 



19 



Iteration count I 





1 


2 


3 


4 


5 


6 


7 


Error 




0.460 


0.101 


0.038 


0.024 


0.022 


0.022 


0.022 


0.022 



Table 2: Relative £2 reconstruction error as a function of reweighting iteration for two-pulse 
signal reconstruction. 



4.1 Incoherent sampling of radar pulses 

Our first example is motivated by our own research focused on advancing devices for analog- 
to-digital conversion of high-bandwidth signals. To cut a long story short, standard analog-to- 
digital converter (ADC) technology implements the usual quantized Shannon representation; that 
is, the signal is uniformly sampled at or above the Nyquist rate. The hardware brick wall is 
that conventional analog-to-digital conversion technology is currently limited to sample rates on 
the order of IGHz, and hardware implementations of high precision Shannon-based conversion 
at substantially higher rates seem out of sight for decades to come. This is where the theory of 
compressive sensing becomes relevant. 

Whereas it may not be possible to digitize an analog signal at a very high rate rate, it may 
be quite possible to change its polarity at a high rate. The idea is then to multiply the signal by 
a pseudo-random sequence of plus and minus ones, integrate the product over time windows, and 
digitize the integral at the end of each time interval. This is a parallel architecture and one has 
several of these random multiplier-integrator pairs running in parallel using distinct or event nearly 
independent pseudo-random sign sequences. 

To show the promise of this approach, we take xq to be a 1-D signal of length n = 512 which 
is a superposition of two modulated pulses (see Figure fTTT a)). From this signal, we collect m = 30 
measurements using an m x n matrix $ populated with i.i.d. Bernoulli ±1 entries. This is an 
unreasonably small amount of data corresponding to an undersampling factor exceeding 17. For 
reconstruction we consider a time-frequency Gabor dictionary that consists of a variety of sine 
waves modulated by Gaussian windows, with different locations and scales. Overall the dictionary 
is approximately 43 x overcomplete and does not contain the two pulses that comprise xq. 

Figure [TTT b) shows the result of minimizing £1 synthesis (I16p in this redundant dictionary. 
The reconstruction shows pronounced artifacts and ||xo — xH^j/HxH^g ^ 0.67. These artifacts are 
somewhat reduced by analysis-based £1 recovery (flTl) . as demonstrated in Figure fTTT c): here, see 
||xo — xll^j/ll^llfe ~ 0.46. However, reweighting the £1 analysis problem offers a very substantial 
improvement. Figure [TlT d) shows the result after four iterations; ||a;o — a;^^^ H^j/ll^llfe is now about 
0.022. Further, Table [2] shows the relative reconstruction error ||xo — x*-^^ ||£2/||j;||^2 ^ ^ function of 
the iteration count i. Massive gains are achieved after just 4 iterations. 

4.2 Frequency sampling of biomedical images 

Compressed sensing can help reduce the scan time in Magnetic Resonance Imaging (MRI) and offer 
sharper images of living tissues. This is especially important because time consuming MRI scans 
have traditionally limited the use of this sensing modality in important applications. Simply put, 
faster imaging here means novel applications. In MR, one collects information about an object by 
measuring its Fourier coefficients and faster acquisition here means fewer measurements. 

We mimic an MR experiment by taking our unknown image xq to be the n = 256 x 256 = 65536 
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Figure 11: (a) Original two-pulse signal (blue) and reconstructions (red) via (b) £i synthesis, 
(c) £i analysis, (d) reweighted £i analysis, (e) Relative £2 reconstruction error as a function of 
reweighting iteration. 
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pixel MR angiogram image shown in Figure [T2lf a). We sample the image along 80 lines in the Fourier 
domain (see Figure [T2r b)). effectively taking m = 18737 real-valued measurements y = ^xq. In 
plain terms, we undersample by a factor of about 3. 

Figure HZ^c) shows the minimum energy reconstruction which solves 

min||x||£2 subject to y = ^x. (18) 

Figure [T2lfd) shows the result of TV minimization. The minimum £i-analysis ()17p solution where 
^' is a three-scale redundant D4 wavelet dictionary that is 10 times overcomplete, is shown on 
Figure [T2lf e) . Figure [TTl i) shows the result of reweighting the ii analysis with imax = 4 and e 
set to 100. For a point of comparison, the maximum wavelet coefficient has amplitude 4020, and 
approximately 108000 coefficients (out of 655360) have amplitude greater than 100. 

We can reinterpret these results by comparing the reconstruction quality to the best fc-term 
approximation to the image xq in a nonredundant wavelet dictionary. For example, an £2 re- 
construction error equivalent to the £2 reconstruction of Figure [l2jc) would require keeping the 
k = 1905 ~ m/9.84 largest wavelet coefficients from the orthogonal wavelet transform of our test 
image. In this sense, the requisite oversampling factor can be thought of as being 9.84. Of course 
this can be substantially improved by encouraging sparsity, and the factor is reduced to 3.33 using 
TV minimization, to 3.25 using ii analysis, and to 3.01 using reweighted li analysis. 

We would like to be clear about what this means. Consider the image in Figure [T2lfa) and 
its best A;-term wavelet approximation with k = 6225; that is, the approximation obtained by 
computing all the D4 wavelet coefficients and retaining the k largest in the expansion of the object 
(and throwing out the others). Then we have shown that the image obtained by measuring 3k 
real-valued Fourier measurements and solving the iterative reweighted ii analysis has just about 
the same accuracy. That is, the oversampling factor needed to obtain an image of the same quality 
as if one knew ahead of time the locations of the k most significant pieces of information and their 
value, is just 3. 

5 Discussion 

In summary, reweighted li minimization outperforms plain ii minimization in a variety of setups. 
Therefore, this technique might be of interest to researchers in the field of compressed sensing 
and/or statistical estimation as it might help to improve the quality of reconstructions and/or 
estimations. Further, this technique is easy to deploy as (1) it can be built on top of existing £1 
solvers and (2) the number of iterations is typically very low so that the additional computational 
cost is not prohibitive. We conclude this paper by discussing related work and possible future 
directions. 

5.1 Related work 

Whereas we have focused on modifying the ii norm, a number of algorithms been have proposed that 
involve successively reweighting alternative penalty functions. In addition to IRLS (see Section [23]) . 
several such algorithms deserve mention. 

Gorodnitsky and Rao [48] propose FOCUSS as an iterative method for finding sparse solutions 
to underdetermined systems. At each iteration, FOCUSS solves a reweighted £2 minimization with 
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(a) (b) 




(e) (f) 



Figure 12: (a) Original MR angiogram, (b) Fourier sampling pattern, (c) Backprojection, 
PSNR = 29.00dB. (d) Minimum TV reconstruction, PSNR = 34.23dB. (e) h analysis recon- 
struction, PSNR = 34.37dB. (f) Reweighted h analysis reconstruction, PSNR = 34.78dB. 
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weights 




1 



(19) 



(£-1) 



for i = 1,2, . . . ,n. For nonzero signal coefficients, it is sliown that each step of FOCUSS is equivalent 
to a step of the modified Newton's method for minimizing the function 



subject to y = ^x. As the iterations proceed, it is suggested to identify those coefficients apparently 
converging to zero, remove them from subsequent iterations, and constrain them instead to be 
identically zero. 

In a small series of experiments, we have observed that reweighted £i minimization recovers 
sparse signals with lower error (or from fewer measurements) than the FOCUSS algorithm. We 
attribute this fact, for one, to the natural tendency of unweighted ii minimization to encourage 
sparsity (while unweighted £2 minimization does not). We have also experimented with an e- 
regularization to the reweighting function (|19p that is analogous to ([6]) . However we have found that 
this formulation fails to encourage strictly sparse solutions. (Sparse solutions can be encouraged by 
letting e ^ as the iterations proceed, but the overall performance remains inferior to reweighted 
ii minimization with fixed e.) 

Harikumar and Bresler [49] propose an iterative algorithm that can be viewed as a generalization 
of FOCUSS. At each stage, the algorithm solves a convex optimization problem with a reweighted 
£2 cost function that encourages sparse solutions. The algorithm allows for different reweighting 
rules; for a given choice of reweighting rule, the algorithm converges to a local minimum of some 
concave objective function (analogous to the log-sum penalty function in ([71)). These methods build 
upon £2 minimization rather than ii minimization. 

Delaney and Bresler [50] also propose a general algorithm for minimizing functionals having 
concave regularization penalties, again by solving a sequence of reweighted convex optimization 
problems (though not necessarily £2 problems) with weights that decrease as a function of the prior 
estimate. With the particular choice of a log-sum regularization penalty, the algorithm resembles 
the noise-aware reweighted £1 minimization discussed in Section [3.31 

Finally, in a slightly different vein, Chartrand [45] has recently proposed an iterative algorithm 
to minimize the concave objective \\x\\£^ with p < 1. (The algorithm alternates between gradient 
descent and projection onto the constraint set y = ^x.) While a global optimum cannot be 
guaranteed, experiments suggests that a local minimum may be found — when initializing with the 
minimum £2 solution — that is often quite sparse. This algorithm seems to outperform (Pi) in a 
number of instances and offers further support for the utility of nonconvex penalties in sparse signal 
recovery. To reiterate, a major advantage of reweighted £1 minimization in this thrust is that (1) 
it can be implemented in a variety of settings (see Sections [3] and H]) on top of existing and mature 
linear programming solvers and (2) it typically converges in very few steps. The log-sum penalty 
is also more ^o-like and as we discuss in Section 12.41 additional concave penalty functions can be 
considered simply by adapting the reweighting rule. 

5.2 Future directions 

In light of the promise of reweighted £1 minimization, it seems desirable to further investigate the 
properties of this algorithm. 
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• Under what conditions does the algorithm converge? That is, when do the successive iterates 
x^^-* have a hmit x^°°^? 

• As shown in Section [2l when there is a sparse solution and the reweighted algorithm finds 
it, convergence may occur in just very few steps. It would be of interest to understand this 
phenomenon more precisely. 

• What are smart and robust rules for selecting the parameter e? That is, rules that would 
automatically adapt to the dynamic range and the sparsity of the object under study as to 
ensure reliable performance across a broad array of signals. Of interest are ways of updating 
e as the algorithm progresses towards a solution. Of course, e does not need to be uniform 
across all coordinates. 

• We mentioned the use of other functionals and reweighting rules. How do they compare? 

• Finally, any result quantifying the improvement of the reweighted algorithm for special classes 
of sparse or nearly sparse signals would be significant. 
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